Assignment 04: Data Visualization — ggplot2 and Beyond

Author: Alona Nazarenko
Affiliation: Kyiv School of Economics

Data on Vehicles for 2025 🚗

As I’ve already visualized my previous dataset about freedom, I was looking for something new and interesting. During a lecture, Ihor Miroshnychenko showed us a website with open data for New York 🌆. That got me curious, so I searched the web and found a similar site for Ukraine 🇺🇦.

I quickly found a dataset of vehicle registration records from 01.01.2025 through 30.09.2025, the most recent data available. You can check it out on the DIA portal 📊.

This dataset is very large: 1,641,470 rows and 20 columns.

library(tidyverse)  # read_csv2(), dplyr, stringr, forcats, ggplot2

car_data <- read_csv2("reestrtz30.09.2025.csv")  # semicolon-delimited file
car_data
# A tibble: 1,641,470 × 20
   PERSON REG_ADDR_KOATUU OPER_CODE OPER_NAME   D_REG DEP_CODE DEP   BRAND MODEL
   <chr>  <chr>               <dbl> <chr>       <chr>    <dbl> <chr> <chr> <chr>
 1 P      6310138200            254 НАЛЕЖНИЙ К… 01.0…    10000 OLD_… LEXUS IS 2…
 2 P      3222486201            254 НАЛЕЖНИЙ К… 01.0…    10000 OLD_… MG    350  
 3 P      3521485801            254 НАЛЕЖНИЙ К… 01.0…    10000 OLD_… SKODA SUPE…
 4 P      8000000000            254 НАЛЕЖНИЙ К… 01.0…    10000 OLD_… TOYO… RAV-…
 5 P      5120410100            254 НАЛЕЖНИЙ К… 01.0…    10000 OLD_… RENA… KANG…
 6 P      8036300000            254 НАЛЕЖНИЙ К… 01.0…    10000 OLD_… CHEV… ORLA…
 7 P      4623683701            254 НАЛЕЖНИЙ К… 01.0…    10000 OLD_… NISS… LEAF 
 8 P      8039100000            254 НАЛЕЖНИЙ К… 01.0…    10000 OLD_… RENA… MEGA…
 9 P      8038500000            254 НАЛЕЖНИЙ К… 01.0…    10000 OLD_… TOYO… PROA…
10 P      1210100000            254 НАЛЕЖНИЙ К… 01.0…    10000 OLD_… VOLK… TIGU…
# ℹ 1,641,460 more rows
# ℹ 11 more variables: VIN <chr>, MAKE_YEAR <dbl>, COLOR <chr>, KIND <chr>,
#   BODY <chr>, PURPOSE <chr>, FUEL <chr>, CAPACITY <dbl>, OWN_WEIGHT <dbl>,
#   TOTAL_WEIGHT <dbl>, N_REG_NEW <chr>
Column Name      Description
person           Type of vehicle owner: P = private person, J = legal entity.
reg_addr_koatuu  KOATUU code of the registration address (administrative territorial code).
oper_code        Operation code indicating the type of registration or action.
oper_name        Description of the operation (e.g., “НАЛЕЖНИЙ КОРИСТУВАЧ…”).
d_reg            Registration date of the vehicle.
dep_code         Code of the department or registration office.
dep              Name of the department or office.
brand            Vehicle brand (e.g., TOYOTA, BMW).
model            Vehicle model.
vin              Vehicle Identification Number (unique identifier for the vehicle).
make_year        Year the vehicle was manufactured.
color            Color of the vehicle.
kind             Vehicle type (e.g., car, truck, motorcycle).
body             Body type (e.g., sedan, SUV, hatchback).
purpose          Vehicle purpose (e.g., personal, commercial).
fuel             Fuel type (e.g., petrol, diesel, electric).
capacity         Engine capacity in cubic centimeters (cc).
own_weight       Vehicle’s own weight (kg).
total_weight     Vehicle’s total weight including load (kg).
n_reg_new        New registration plate number; its first two letters are the region code.

Data Preparation

Here, I make the dataset easier to work with by categorizing the operations and identifying the region codes where each operation was performed.

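library(janitor)    # clean_names()
library(lubridate)  # dmy()

car_data <- car_data |> 
  clean_names() |>                       # PERSON -> person, D_REG -> d_reg, ...
  mutate(d_reg = dmy(d_reg),             # day.month.year text -> Date
         # case_when() uses the first rule that matches, so the order matters
         category_oper = case_when(
           str_detect(oper_name, "ПЕРВИННА|(^|\\s)РЕЄСТРАЦІЯ") ~ "Registration",
           str_detect(oper_name, "ПЕРЕРЕЄСТРАЦІЯ") ~ "Re_registration",
           str_detect(oper_name, "ТИМЧАС") ~ "Temporary",
           str_detect(oper_name, "ЗНЯТТЯ З ОБЛІКУ|СКАСУВАННЯ ЗНЯТТЯ") ~ "Deregistration",
           TRUE ~ "Other"
         ),
         # The first two letters of the plate number are the region code
         number_obl = str_extract(n_reg_new, "^[А-ЯІA-Z]{2}"))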
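Because case_when() stops at the first matching rule, it can be worth running the classifier on a few sample strings before trusting the counts. The sketch below applies the same rules to four operation names that are made up for illustration (they are not taken from the dataset), so treat it as a sanity check rather than a statement about the real data.

```r
library(dplyr)
library(stringr)

# Hypothetical operation names, for illustration only
toy_ops <- tibble::tibble(
  oper_name = c(
    "ПЕРВИННА РЕЄСТРАЦІЯ ТЗ",
    "ПЕРЕРЕЄСТРАЦІЯ ТЗ",
    "ТИМЧАСОВА РЕЄСТРАЦІЯ ТЗ",
    "ЗНЯТТЯ З ОБЛІКУ"
  )
)

# Same rules, in the same order, as in the data preparation step
classified <- toy_ops |>
  mutate(category_oper = case_when(
    str_detect(oper_name, "ПЕРВИННА|(^|\\s)РЕЄСТРАЦІЯ") ~ "Registration",
    str_detect(oper_name, "ПЕРЕРЕЄСТРАЦІЯ") ~ "Re_registration",
    str_detect(oper_name, "ТИМЧАС") ~ "Temporary",
    str_detect(oper_name, "ЗНЯТТЯ З ОБЛІКУ|СКАСУВАННЯ ЗНЯТТЯ") ~ "Deregistration",
    TRUE ~ "Other"
  ))

classified
```

In this toy input the third string lands in "Registration", not "Temporary", because the word РЕЄСТРАЦІЯ after a space already satisfies the first rule. If the real oper_name values contain such strings and that is not intended, moving the "ТИМЧАС" rule to the top would be one way to handle it.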

Web Scraping 🕸️

For my analysis, having only the region code wasn’t enough — I also needed the region name.
So, I used web scraping to create a table that decodes the region codes from this site.

library(rvest)  # read_html(), html_elements(), html_table()

num_site <- read_html("https://avtonomera.net.ua/ua/korysna-informatsiya/kody-avtomobilnykh-nomeriv-diznatysya-kod-rehionu-za-nomerom-mashyny-nomery-rehioniv-ukrayiny.html?srsltid=AfmBOopXkW32murezYkxdKyhBHcr0kUIp52nKuczEcgmX4sDybAloNkm")

# html_elements() replaces the superseded html_nodes(); html_table() now
# pads missing cells with NA by itself, so fill = TRUE is no longer needed
number_df <- num_site |> 
  html_elements("table") |> 
  html_table()

number_df <- number_df[[1]] |>                 # first table on the page
  clean_names() |>
  slice(-1) |>                                 # drop the first row of the table
  select(region_ukraini, code = z_2004_roku_liternij_kod_zliva) |>
  separate_rows(code, sep = ",\\s*") |>        # one row per region code
  mutate(
    code = str_trim(code),
    code = str_replace_all(code, "I", "І")     # Latin "I" -> Cyrillic "І"
  ) 

Another big pain was the letter “І”: the Cyrillic “І” looks identical to the Latin “I” in most fonts, so codes that looked the same on screen failed to match when I joined the tables. I had to normalize the text so that every code used the Ukrainian letter.
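The same look-alike problem can affect other letters, not only “І”. As a minimal sketch (not the code used in this report), the hypothetical helper normalize_code() below maps every Latin capital that has a Cyrillic twin to its Cyrillic form. The Cyrillic side is written with Unicode escapes on purpose, so that nobody has to guess which alphabet a character in the script belongs to.

```r
# Latin capitals that have a visually identical Cyrillic letter
latin    <- "ABCEHIKMOPTX"
# The Cyrillic twins, in the same order: А В С Е Н І К М О Р Т Х
cyrillic <- "\u0410\u0412\u0421\u0415\u041D\u0406\u041A\u041C\u041E\u0420\u0422\u0425"

# Hypothetical helper: trim whitespace, then swap Latin look-alikes for Cyrillic
normalize_code <- function(x) {
  chartr(latin, cyrillic, trimws(x))
}

# Made-up input: Latin "AI", Cyrillic "КА" with a stray space, Latin "BC"
codes <- c("AI", " \u041A\u0410", "BC")
normalized <- normalize_code(codes)

# No Latin capitals should be left after normalization
any(grepl("[A-Z]", normalized))
```

Applying the same helper to both the scraped codes and number_obl before a join would put the two columns in one alphabet, assuming all region codes are meant to be Cyrillic.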

Insights & Graphs 📊

The first thing I was curious about was which car brands Ukrainians prefer the most. To my surprise, it is Volkswagen.

car_data |> 
  filter(category_oper %in% c("Registration", "Re_registration") & kind == "ЛЕГКОВИЙ") |> 
  count(brand) |> 
  # with_ties = FALSE guarantees exactly 10 rows, matching the 10 manual colours below
  slice_max(n, n = 10, with_ties = FALSE) |> 
  mutate(
    logo = file.path("logos", paste0(brand, ".png")),  # expects logos/<BRAND>.png
    brand = fct_reorder(brand, n)                      # order bars by count
  ) |> 
  ggplot(aes(brand, n, fill = brand)) + 
  geom_col() +
  geom_image(aes(y = n + max(n) * 0.05, image = logo), size = 0.1) +  # from ggimage
  geom_text(aes(y = n - max(n) * 0.18,
                label = scales::comma(n),
                color = brand), 
            hjust = 0, size = 4) +
  # Bar fills run from light (fewest registrations) to dark (most)
  scale_fill_manual(values = c(
    "#C4DDFF", "#A0C4FF", "#7DA9FF", "#5C81F2", "#4B6ECF",
    "#3C5AA6", "#2E4A7D", "#23395B", "#1B263B", "#0D1B2A"
  )) + 
  # Dark labels on the five light bars, light labels on the five dark bars
  scale_color_manual(values = rep(c("#0D1B2A", "#C4DDFF"), each = 5)) +
  scale_y_continuous(labels = scales::comma) +
  coord_flip() +
  labs(
    title = "Top 10 Car Brands by Registrations",
    x = "Brand",
    y = "Number of Registrations"
  ) +
  theme_minimal(base_size = 14) +
  theme(
    axis.text.y = element_text(face = "bold", hjust = 1),
    plot.title = element_text(face = "bold", hjust = 0.5),
    legend.position = "none"
  )

I also did a deeper analysis and found that the most popular car model is the 2012 Volkswagen Passat, which sells for roughly $9,000–$10,000. Its popularity likely reflects affordability: at that price, it is within reach of more buyers given the average Ukrainian salary.
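The code behind this deeper analysis is not shown, so here is one way such a ranking could be produced: count every brand, model and make year combination and sort by frequency. The rows in toy_cars are made up for illustration; with the real data the same pipeline would start from car_data.

```r
library(dplyr)

# Toy rows standing in for car_data (values invented for illustration)
toy_cars <- tibble::tribble(
  ~brand,       ~model,    ~make_year,
  "VOLKSWAGEN", "PASSAT",  2012,
  "VOLKSWAGEN", "PASSAT",  2012,
  "VOLKSWAGEN", "GOLF",    2010,
  "SKODA",      "OCTAVIA", 2015,
  "VOLKSWAGEN", "PASSAT",  2008
)

# Most frequent brand / model / year combinations first
top_models <- toy_cars |>
  count(brand, model, make_year, sort = TRUE) |>
  slice_head(n = 3)

top_models
```

Counting by model alone, without make_year, would answer a slightly different question (the most popular model across all years), so it is worth deciding which of the two the chart should support.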

Volkswagen Passat 2012

Next Prey 🐾

My next prey was the age of the different vehicle types prowling our roads.

year_graph <- car_data |> 
  filter(category_oper %in% c("Registration", "Re_registration")) |>
  # Order vehicle types by median production year
  mutate(kind = fct_reorder(kind, make_year, .fun = median)) |>
  ggplot(aes(x = kind, y = make_year, colour = kind)) + 
  geom_boxplot(outlier.alpha = 0.5) +
  scale_y_continuous(
    breaks = seq(min(car_data$make_year, na.rm = TRUE),
                 max(car_data$make_year, na.rm = TRUE), by = 10)
  )+
  scale_colour_manual(values = c(
    "#000000",  
    "#1F5B89",  
    "#337AB7",
    "#F25C05",  
    "#FF9F1C",  
    "#004D4D",  
    "#009999",  
    "#FF4FA3",  
    "#C71585" ) 
    )+
  labs(
    title = "Distribution of Vehicle Production Years by Type in Ukraine🇺🇦",
    x = "Type of Vehicle",
    y = "Year of Manufacture"
  ) +
  theme_minimal(base_size = 14) +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1),
    plot.title = element_text(face = "bold", hjust = 0.5),
    legend.position = "none" 
  )

ggplotly(year_graph)

Most cars are up to 20 years old 🚗. The preference for older vehicles likely reflects affordability 💰, highlighting the importance of increasing income levels 📊.

A few unusual outliers caught my eye: vehicles supposedly more than 100 years old 🕰️🚘 (make_year = 1900), which I couldn’t resist exploring 🔎.

Table 1: Outliers
brand    model  make_year  color     purpose      fuel            vin                number_obl
ЗИЛ      130    1900       ЗЕЛЕНИЙ   СПЕЦІАЛЬНИЙ  БЕНЗИН АБО ГАЗ  00000000000000146  NA
ЗИЛ      130    1900       ЗЕЛЕНИЙ   СПЕЦІАЛЬНИЙ  БЕНЗИН          1021382            NA
HONDA    LEAD   1900       СІРИЙ     ЗАГАЛЬНИЙ    БЕНЗИН          JH1000AF481009931  ВА
ГАЗ      3307   1900       СИНІЙ     ЗАГАЛЬНИЙ    БЕНЗИН          XTH330700P1452790  NA
АЗЛК     2140   1900       СИНІЙ     ЗАГАЛЬНИЙ    БЕНЗИН          NA                 NA
PACKARD  180    1900       ЧОРНИЙ    ЗАГАЛЬНИЙ    БЕНЗИН          128213520          СЕ
SUZUKI   LETS   1900       ЧЕРВОНИЙ  ЗАГАЛЬНИЙ    БЕНЗИН          JS1000CA1KA129840  NA
PACKARD  180    1900       ЧОРНИЙ    ЗАГАЛЬНИЙ    БЕНЗИН          128213520          СЕ
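A table like this can be pulled out with a simple filter on make_year. The sketch below uses a few made-up rows in place of car_data. Every outlier above carries the year 1900, and brands such as Honda did not exist yet at that time, so 1900 is probably a placeholder for an unknown year and not a real production date; that reading is my assumption, not something the dataset documents.

```r
library(dplyr)

# Toy rows standing in for car_data (invented for illustration)
toy_cars <- tibble::tribble(
  ~brand,    ~model, ~make_year,
  "PACKARD", "180",  1900,
  "HONDA",   "LEAD", 1900,
  "NISSAN",  "LEAF", 2015
)

# Keep only records with an implausibly early production year
outliers <- toy_cars |>
  filter(make_year <= 1900) |>
  arrange(brand)

outliers
```

If 1900 really is a placeholder, recoding it to NA before plotting would keep it from stretching the axis of the boxplot.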

Packard 180

The next graph shows that, among the vehicles registered in 2025, the most common year of manufacture is 2008 🚗: it stands out as the peak of the distribution.

car_data |> 
  filter((category_oper == "Registration" | 
            category_oper == "Re_registration")) |> 
  select(make_year) |> 
  ggplot(aes(make_year)) + 
  geom_density(fill = "#009999", alpha = 0.5) +  
  labs(
    title = "Distribution of Vehicle Make Year",
    x = "Make Year",
    y = "Density"
  ) +
  scale_x_continuous(
    limits = c(1980, max(car_data$make_year)),     
    breaks = seq(1980, max(car_data$make_year), 5) 
  ) +
  theme_minimal(base_size = 14) +
  geom_text(
    data = peak,  # one-row data frame (make_year, density_year) built outside this chunk
    aes(make_year, density_year, label = (round(make_year))),
    vjust = - 1.8,
    color = "#004D4D",
    check_overlap = TRUE, 
    fontface = "bold",
    size = 4.5
  )
Warning: Removed 3276 rows containing non-finite outside the scale range
(`stat_density()`).
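The plot code relies on a peak object whose construction is not shown. One plausible way to build it, sketched here with base R and made-up years, is to run a kernel density estimate and keep the point where the curve is highest. The column names make_year and density_year match what the geom_text() call expects; everything else is an assumption.

```r
# Hypothetical make years standing in for car_data$make_year
make_year <- c(2005, 2007, 2008, 2008, 2008, 2009, 2012, 2015)

dens <- density(make_year)   # base R kernel density estimate
i <- which.max(dens$y)       # index of the highest point on the curve

# One-row data frame with the location and height of the peak
peak <- data.frame(
  make_year    = dens$x[i],
  density_year = dens$y[i]
)

peak
```

geom_density() evaluates its curve on its own grid and only on the data kept by the axis limits, so a peak computed this way should sit close to the plotted maximum but may not match it exactly.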

🌱 Ecology

I also decided to explore how eco-friendly our country is this year 🌍.
Below is a map of Ukraine 🇺🇦 showing the number of electric ⚡ cars in each region.

electro_graph <- ggplot(ukraine_sf) +
  geom_sf(aes(
    fill = electro,
    text = glue("Region: {NAME_1}
    Electric: {electro}
    Total registrations: {total}
    Share: {round(electro / total * 100, 1)}%")
    ), color = "black") +
  scale_fill_gradientn(
    colours = c("#F7FEE7", "#A3E635", "#65A30D", "#3F6212", "#1A2E05"),
    na.value = "grey90",
    name = "Electric Cars"
  ) + theme_minimal(base_size = 14) +
  labs(title = glue("\U000026A1 Electric Mobility Across Ukraine")) +
  theme(
    plot.title = element_text(hjust = 0.5, face = "bold", color = "#14532D"),
    plot.caption = element_text(hjust = 0.5, face = "italic", color = "#14532D"),
    panel.background = element_rect(fill = "#F8FFF8", color = NA)
  )
Warning in layer_sf(geom = GeomSf, data = data, mapping = mapping, stat = stat,
: Ignoring unknown aesthetics: text
ggplotly(electro_graph, tooltip = "text") |> 
  layout(
    hoverlabel = list(
      bgcolor = "#F7FEE7",   
      bordercolor = "#65A30D",
      font = list(color = "#3F6212", size = 13)
    )
  )

Registrations via DIA portal: 1589
Total electric car registrations: 91661
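The electro and total columns shown in the tooltip have to be computed per region before they can be attached to the map. The sketch below shows one way to do that with made-up rows: count electric and all registrations per region code, then join the region names. The fuel label "ЕЛЕКТРО" and the two-row lookup table are assumptions for illustration; the real pipeline would use car_data and the scraped number_df.

```r
library(dplyr)

# Toy registrations (invented); number_obl is the two-letter region code
toy_cars <- tibble::tribble(
  ~number_obl, ~fuel,
  "АА",        "ЕЛЕКТРО",
  "АА",        "БЕНЗИН",
  "АА",        "ЕЛЕКТРО",
  "ВС",        "ЕЛЕКТРО",
  "ВС",        "ДИЗЕЛЬНЕ ПАЛИВО"
)

# Toy lookup table standing in for the scraped number_df
toy_codes <- tibble::tribble(
  ~code, ~region_ukraini,
  "АА",  "Kyiv",
  "ВС",  "Lviv"
)

region_stats <- toy_cars |>
  group_by(number_obl) |>
  summarise(
    electro = sum(fuel == "ЕЛЕКТРО"),  # electric registrations in the region
    total   = n(),                     # all registrations in the region
    .groups = "drop"
  ) |>
  left_join(toy_codes, by = c("number_obl" = "code")) |>
  mutate(share = round(electro / total * 100, 1)) |>
  arrange(desc(electro))

region_stats
```

A table like region_stats could then be joined to the region polygons by name to produce something like ukraine_sf. Mapping share instead of the raw electro count would also make large and small regions easier to compare.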

Here we can see the leaders in electric car adoption ⚡:

  • 🏙️ Kyiv
  • 🏞️ Lviv
  • 🌊 Dnipro

It is sad to say 💔, but Kharkiv no longer has the same capacity as before, and all frontline regions ⚠️ are similarly affected.

In contrast, the western regions 🌄 are making much greater progress in adopting electric vehicles 🚗🔋.
This is yet another consequence of the war ⚔️.